Learning Adversarial Markov Decision Processes with Delayed Feedback

نویسندگان

چکیده

Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) is observed delay. This paper studies online episodic Markov decision processes (MDPs) with unknown transitions, adversarially changing costs and unrestricted delayed feedback. That is, the trajectory of episode k are revealed to learner only end k+d?, where delays d? neither identical nor bounded, chosen by an oblivious adversary. We present novel algorithms based on policy optimization achieve near-optimal high-probability regret (K+D)¹?² under full-information feedback, K number episodes D=?? total Under bandit we prove similar assuming stochastic, (K+D)²?³ general case. first consider minimization important setting MDPs

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Online model learning in adversarial Markov decision processes

Consider, for example, the well-known game of Roshambo (Figure 1), or rock-paper-scissors, in which two players select one of three actions simultaneously. One may know that the adversary will base its next action on some bounded sequence of the past joint actions, but may be unaware of its exact strategy. For example, one may notice that every time it selects P , the adversary selects S in the...

متن کامل

Learning Qualitative Markov Decision Processes Learning Qualitative Markov Decision Processes

To navigate in natural environments, a robot must decide the best action to take according to its current situation and goal, a problem that can be represented as a Markov Decision Process (MDP). In general, it is assumed that a reasonable state representation and transition model can be provided by the user to the system. When dealing with complex domains, however, it is not always easy or pos...

متن کامل

Learning Markov Decision Processes for Model Checking

Constructing an accurate system model for formal model verification can be both resource demanding and time-consuming. To alleviate this shortcoming, algorithms have been proposed for automatically learning system models based on observed system behaviors. In this paper we extend the algorithm on learning probabilistic automata to reactive systems, where the observed system behavior is in the f...

متن کامل

Principled Option Learning in Markov Decision Processes

It is well known that options can make planning more efficient, among their many benefits. Thus far, algorithms for autonomously discovering a set of useful options were heuristic. Naturally, a principled way of finding a set of useful options may be more promising and insightful. In this paper we suggest a mathematical characterization of good sets of options using tools from information theor...

متن کامل

Reinforcement Learning and Markov Decision Processes

Situated in between supervised learning and unsupervised learning, the paradigm of reinforcement learning deals with learning in sequential decision making problems in which there is limited feedback. This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. Fir...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Proceedings of the ... AAAI Conference on Artificial Intelligence

سال: 2022

ISSN: ['2159-5399', '2374-3468']

DOI: https://doi.org/10.1609/aaai.v36i7.20690